Quality Factor


Scaling-up Perceptual Video Quality Assessment

Jia, Ziheng, Zhang, Zicheng, Zhang, Zeyu, Liang, Yingji, Zhu, Xiaorong, Li, Chunyi, Han, Jinliang, Wu, Haoning, Wang, Bin, Zhang, Haoran, Zhu, Guanyu, Zhao, Qiyong, Liu, Xiaohong, Zhai, Guangtao, Min, Xiongkuo

arXiv.org Artificial Intelligence

The data scaling law has been shown to significantly enhance the performance of large multi-modal models (LMMs) across various downstream tasks. However, in the domain of perceptual video quality assessment (VQA), the potential of scaling laws remains largely unexplored due to the scarcity of labeled resources and the insufficient scale of existing datasets. To address this, we propose OmniVQA, an efficient framework for building high-quality, human-in-the-loop VQA multi-modal instruction databases (MIDBs). We then scale up to create OmniVQA-Chat-400K, currently the largest MIDB in the VQA field. Our focus is on the technical and aesthetic quality dimensions, with abundant in-context instruction data providing fine-grained VQA knowledge. Additionally, we build the OmniVQA-MOS-20K dataset to enhance the model's quantitative quality-rating capabilities. We then introduce a complementary training strategy that effectively leverages the knowledge from both datasets for the quality understanding and quality rating tasks. Furthermore, we propose the OmniVQA-FG (fine-grain) Benchmark to evaluate the fine-grained performance of the models. Our results demonstrate that our models achieve state-of-the-art performance in both quality understanding and rating tasks.


ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws

Li, Ruihang, Wei, Yixuan, Zhang, Miaosen, Yu, Nenghai, Hu, Han, Peng, Houwen

arXiv.org Artificial Intelligence

High-quality data is crucial for the pre-training performance of large language models. Unfortunately, existing quality filtering methods rely on a known high-quality dataset as a reference, which can introduce potential bias and compromise diversity. In this paper, we propose ScalingFilter, a novel approach that evaluates text quality based on the perplexity difference between two language models trained on the same data, thereby eliminating the influence of a reference dataset in the filtering process. A theoretical analysis shows that ScalingFilter is equivalent to an inverse utilization of scaling laws. By training models with 1.3B parameters on the same data source processed by various quality filters, we find that ScalingFilter improves the zero-shot performance of pre-trained models on downstream tasks. To assess the bias introduced by quality filtering, we introduce semantic diversity, a metric that uses text embedding models to capture semantic representations. Extensive experiments reveal that semantic diversity is a reliable indicator of dataset diversity, and that ScalingFilter achieves an optimal balance between downstream performance and semantic diversity.
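
To make the scoring rule concrete, here is a minimal Python sketch of the perplexity-difference idea: a document is scored by how much better a larger model explains it than a smaller one trained on the same data. The GPT-2 checkpoints below are placeholders for illustration, not the meta-models used in the paper.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    small = AutoModelForCausalLM.from_pretrained("gpt2")         # smaller meta-model
    large = AutoModelForCausalLM.from_pretrained("gpt2-medium")  # larger meta-model
    tok = AutoTokenizer.from_pretrained("gpt2")                  # shared tokenizer

    def avg_nll(model, text):
        # mean negative log-likelihood per token, i.e. log-perplexity
        enc = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            loss = model(**enc, labels=enc["input_ids"]).loss
        return loss.item()

    def quality_factor(text):
        # difference of log-perplexities: higher means the larger model explains
        # the text disproportionately better, read here as higher data quality
        return avg_nll(small, text) - avg_nll(large, text)

A filter would then keep the top-scoring documents; the cut-off fraction is a separate design choice.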


Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets

Grommelt, Patrick, Weiss, Louis, Pfreundt, Franz-Josef, Keuper, Janis

arXiv.org Artificial Intelligence

The widespread adoption of generative image models has highlighted the urgent need to detect artificial content, which is a crucial step in combating widespread manipulation and misinformation. Consequently, numerous detectors and associated datasets have emerged. However, many of these datasets inadvertently introduce undesirable biases, thereby impacting the effectiveness and evaluation of detectors. In this paper, we emphasize that many datasets for AI-generated image detection contain biases related to JPEG compression and image size. Using the GenImage dataset, we demonstrate that detectors indeed learn from these undesired factors. Furthermore, we show that removing these biases substantially increases robustness to JPEG compression and significantly alters the cross-generator performance of evaluated detectors. Specifically, it leads to an increase of more than 11 percentage points in cross-generator performance for ResNet50 and Swin-T detectors on the GenImage dataset, achieving state-of-the-art results. We provide the dataset and source code for this paper on the anonymous website: https://www.unbiased-genimage.org
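
As an illustration of the kind of debiasing at issue, the sketch below gives real and generated images an identical size and JPEG quality factor before detector training, so that compression and resolution can no longer serve as shortcut features. The exact preprocessing behind the unbiased dataset may differ; the size and quality values here are illustrative.

    import io
    from PIL import Image

    def normalize_image(path, size=(256, 256), jpeg_quality=90):
        # resize, then re-encode every image with the same JPEG settings
        img = Image.open(path).convert("RGB").resize(size, Image.BICUBIC)
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=jpeg_quality)
        buf.seek(0)
        return Image.open(buf)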


A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges

Junior, Roberto Francisco de Lima, Presta, Luiz Fernando Paes de Barros, Borborema, Lucca Santos, da Silva, Vanderson Nogueira, Dahia, Marcio Leal de Melo, Santos, Anderson Carlos Sousa e

arXiv.org Artificial Intelligence

This paper presents a detailed case study examining the application of Large Language Models (LLMs) in the construction of test cases within the context of software engineering. LLMs, characterized by their advanced natural language processing capabilities, are increasingly garnering attention as tools to automate and enhance various aspects of the software development life cycle. Leveraging a case study methodology, we systematically explore the integration of LLMs into the test case construction process, aiming to shed light on their practical efficacy, the challenges encountered, and the implications for software quality assurance. The study encompasses the selection of a representative software application, the formulation of test case construction methodologies employing LLMs, and the subsequent evaluation of outcomes. Through a blend of qualitative and quantitative analyses, the study assesses the impact of LLMs on test case comprehensiveness, accuracy, and efficiency. It also delves into challenges such as model interpretability and adaptation to diverse software contexts. The findings from this case study contribute nuanced insights into the practical utility of LLMs in the domain of test case construction, elucidating their potential benefits and limitations. By addressing real-world scenarios and complexities, this research aims to inform software practitioners and researchers alike about the tangible implications of incorporating LLMs into the software testing landscape, fostering a more comprehensive understanding of their role in optimizing the software development process.
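
For readers unfamiliar with the workflow, a minimal sketch of LLM-assisted test case generation might look as follows. This is not the pipeline from the case study; the client library, model name, and prompt are assumptions for illustration.

    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def draft_test_cases(function_source: str) -> str:
        prompt = (
            "Write pytest test cases for the following Python function. "
            "Cover normal inputs, edge cases, and invalid inputs.\n\n"
            + function_source
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

In practice, the generated tests still need human review for correctness and coverage, which is precisely the kind of challenge the case study examines.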


Uncertainty Wrapper in the medical domain: Establishing transparent uncertainty quantification for opaque machine learning models in practice

Jöckel, Lisa, Kläs, Michael, Popp, Georg, Hilger, Nadja, Fricke, Stephan

arXiv.org Machine Learning

When systems rely on models based on machine learning (ML), errors in their results cannot be ruled out. This is particularly critical if it remains unclear to the user how these models arrive at their decisions, and if errors can have safety-relevant consequences, as is often the case in the medical field. In such cases, dependable methods for quantifying the uncertainty remaining in a result allow the user to make an informed decision about further usage and to draw appropriate conclusions from a given result. This paper demonstrates the applicability and practical utility of the Uncertainty Wrapper, using flow cytometry as a medical application that can benefit from ML models in conjunction with dependable and transparent uncertainty quantification.
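
A simplified reconstruction of the core idea is sketched below: a decision tree partitions inputs by factors that influence the wrapped model's errors, and each partition reports a conservative (Clopper-Pearson) upper bound on the error rate, estimated on held-out calibration data. This is an assumption-laden sketch, not the authors' implementation, and the flow-cytometry-specific quality factors are not reproduced.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from statsmodels.stats.proportion import proportion_confint

    class UncertaintyWrapper:
        def __init__(self, model, max_leaf_nodes=8):
            self.model = model                      # opaque, already-trained model
            self.tree = DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes)
            self.leaf_error_upper = {}

        def fit(self, X_cal, y_cal):
            errors = (self.model.predict(X_cal) != y_cal).astype(int)
            self.tree.fit(X_cal, errors)            # partition by error behaviour
            leaves = self.tree.apply(X_cal)
            for leaf in np.unique(leaves):
                mask = leaves == leaf
                # Clopper-Pearson upper confidence bound on the leaf error rate
                _, upper = proportion_confint(errors[mask].sum(), mask.sum(),
                                              alpha=0.05, method="beta")
                self.leaf_error_upper[leaf] = upper
            return self

        def predict_with_uncertainty(self, X):
            preds = self.model.predict(X)
            bounds = [self.leaf_error_upper[leaf] for leaf in self.tree.apply(X)]
            return preds, np.array(bounds)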


Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach

Wu, Haoning, Zhang, Erli, Liao, Liang, Chen, Chaofeng, Hou, Jingwen, Wang, Annan, Sun, Wenxiu, Yan, Qiong, Lin, Weisi

arXiv.org Artificial Intelligence

The proliferation of in-the-wild videos has greatly expanded the Video Quality Assessment (VQA) problem. Unlike early definitions that usually focus on limited distortion types, VQA on in-the-wild videos is especially challenging as it can be affected by complicated factors, including various distortions and diverse contents. Though subjective studies have collected overall quality scores for these videos, how the abstract quality scores relate to specific factors is still obscure, hindering VQA methods from more concrete quality evaluations (e.g. the sharpness of a video). To solve this problem, we collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors, including in-capture authentic distortions (e.g. motion blur, noise, flicker), errors introduced by compression and transmission, and higher-level experiences on semantic contents and aesthetic issues (e.g. composition, camera trajectory), to establish the multi-dimensional Maxwell database. Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension. These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings, and to benchmark different categories of VQA algorithms on each dimension, so as to more comprehensively analyze their strengths and weaknesses. Furthermore, we propose MaxVQA, a language-prompted VQA approach that modifies the vision-language foundation model CLIP to better capture important quality issues as observed in our analyses. MaxVQA can jointly evaluate various specific quality factors and final quality scores with state-of-the-art accuracy on all dimensions, and superb generalization ability on existing datasets. Code and data are available at https://github.com/VQAssessment/MaxVQA.
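
The language-prompted mechanism can be illustrated with a simple CLIP antonym-prompt score, in the spirit of (though far simpler than) MaxVQA, which further adapts CLIP as described in the linked repository. The checkpoint and prompt pair below are illustrative assumptions.

    import torch
    import open_clip

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="openai")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")

    def factor_score(pil_image, positive="a sharp video frame",
                     negative="a blurry video frame"):
        text = tokenizer([positive, negative])
        with torch.no_grad():
            img = model.encode_image(preprocess(pil_image).unsqueeze(0))
            txt = model.encode_text(text)
            img = img / img.norm(dim=-1, keepdim=True)
            txt = txt / txt.norm(dim=-1, keepdim=True)
            logits = 100.0 * img @ txt.T
        # softmax over the antonym pair: probability the frame is "sharp"
        return logits.softmax(dim=-1)[0, 0].item()

Evaluating such antonym pairs per quality dimension (noise, flicker, composition, and so on) yields a multi-dimensional quality profile rather than a single score.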


AI Could Learn a Thing or Two from These Three Fields

#artificialintelligence

It's no secret that the field of Artificial Intelligence (AI) is fraught with scandals, biases, and limitations. There's also no shortage of attempts to fix these issues, whether they come from tech, mathematics, ethics, or even design. What's becoming more and more clear is that there will never be a one-size-fits-all solution to these problems; instead of trying to reinvent the wheel, the field could benefit greatly from existing movements and trends toward human-centeredness and inclusivity. In a broad sense, participation means taking part in something. One of the important benefits of stakeholder participation in AI is to more evenly distribute decision-making power and influence among the parties affected by a technology or intervention, especially those experiencing "structural oppression" or "systemic disadvantages".


SQuAP-Ont: an Ontology of Software Quality Relational Factors from Financial Systems

Ciancarini, Paolo, Nuzzolese, Andrea Giovanni, Presutti, Valentina, Russo, Daniel

arXiv.org Artificial Intelligence

Quality, architecture, and process are considered the keystones of software engineering. ISO defines them in three separate standards. However, their interaction has been scarcely studied so far. The SQuAP model (Software Quality, Architecture, Process) describes twenty-eight main factors that impact software quality in banking systems, and each factor is described as a relation among characteristics from the three ISO standards. Hence, SQuAP makes such relations emerge rigorously, although informally. In this paper, we present SQuAP-Ont, an OWL ontology designed by following a well-established methodology based on the reuse of Ontology Design Patterns (ODPs). SQuAP-Ont formalises the relations emerging from SQuAP to represent and reason via Linked Data about software engineering in a three-dimensional model consisting of quality, architecture, and process ISO characteristics. Industrial standards are widely used in software engineering practice: they build on preexisting literature and provide a common ground for scholars and practitioners to analyze, develop, and assess software systems. As far as software quality is concerned, the reference standard is ISO/IEC 25010:2011 (ISO quality from now on), which defines the quality of software products and their usage (i.e., in-use quality). The ISO quality standard introduces eight characteristics that qualify a software product and five characteristics that assess its quality in use. A characteristic is a parameter for measuring the quality of a software-system-related aspect, e.g., reliability, usability, or performance efficiency. The quantitative value associated with a characteristic is measured by means of metrics that depend on the context of a specific software project and are defined following the established literature.
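
Since SQuAP-Ont is published as an OWL ontology, it can be inspected programmatically. The sketch below loads a local copy with rdflib and lists the classes it defines; the file name is a hypothetical placeholder for wherever the ontology is saved.

    from rdflib import Graph

    g = Graph()
    g.parse("squap-ont.owl", format="xml")  # hypothetical local copy of SQuAP-Ont

    # enumerate the OWL classes defined in the ontology
    query = """
    SELECT ?cls WHERE { ?cls a <http://www.w3.org/2002/07/owl#Class> . }
    """
    for row in g.query(query):
        print(row.cls)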


Particle identification in ground-based gamma-ray astronomy using convolutional neural networks

Postnikov, E. B., Bychkov, I. V., Dubenskaya, J. Y., Fedorov, O. L., Kazarina, Y. A., Korosteleva, E. E., Kryukov, A. P., Mikhailov, A. A., Nguyen, M. D., Polyakov, S. P., Shigarov, A. O., Shipilov, D. A., Zhurov, D. P.

arXiv.org Machine Learning

Modern detectors of cosmic gamma rays are a special type of imaging telescope (air Cherenkov telescopes) equipped with cameras containing a relatively large number of photomultiplier-based pixels. For example, the camera of the TAIGA-IACT telescope has 560 pixels arranged in a hexagonal structure. Images from such cameras can be analysed with deep learning techniques to extract numerous physical and geometrical parameters and/or to identify the incoming particle. The most powerful deep learning technique for image analysis, the so-called convolutional neural network (CNN), was implemented in this study. Two open-source libraries for machine learning, PyTorch and TensorFlow, were tested as possible software platforms for particle identification in imaging air Cherenkov telescopes. Monte Carlo simulation was performed to analyse images of gamma rays and background particles (protons) and to estimate identification accuracy. Further steps in the implementation and improvement of this technique are discussed.
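
As a toy illustration of the classification task, the PyTorch sketch below defines a small CNN that labels a camera image as gamma ray or proton. The architecture, input size, and the remapping of the hexagonal pixel layout onto a square grid are simplifying assumptions, not the networks evaluated in the study.

    import torch
    import torch.nn as nn

    class ParticleCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 8 * 8, 2)  # gamma vs. proton logits

        def forward(self, x):              # x: (batch, 1, 32, 32) camera image
            return self.classifier(self.features(x).flatten(1))

    model = ParticleCNN()
    logits = model(torch.randn(4, 1, 32, 32))  # dummy batch of four images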


CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression

Cavigelli, Lukas, Hager, Pascal, Benini, Luca

arXiv.org Artificial Intelligence

Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found their way into the areas of low-level computer vision and image processing to solve regression problems mostly with relatively shallow networks. We present a novel 12-layer deep convolutional network for image compression artifact suppression with hierarchical skip connections and a multi-scale loss function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an improvement of up to 0.36 dB over the best previous ConvNet result. We show that a network trained for a specific quality factor (QF) is resilient to the QF used to compress the input image - a single network trained for QF 60 provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.
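
To convey the flavor of the architecture, here is a deliberately tiny artifact-suppression network with a single global residual (skip) connection, written in PyTorch. The real CAS-CNN is 12 layers deep with hierarchical skip connections and a multi-scale loss function, none of which this sketch reproduces.

    import torch
    import torch.nn as nn

    class TinyArtifactNet(nn.Module):
        def __init__(self, channels=1):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, channels, 3, padding=1),
            )

        def forward(self, x):
            # predict a correction and add it to the decoded JPEG (global skip)
            return x + self.body(x)

    net = TinyArtifactNet()
    restored = net(torch.randn(1, 1, 64, 64))  # dummy decoded-JPEG luma patch

The residual formulation lets the network learn only the artifact correction rather than re-synthesizing the whole image, a common design choice in restoration ConvNets.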